Many thanks to the reviewers for their deep, thoughtful reviews and constructive suggestions.
We note that despite very recent observations on the empirical superiority of adaptive synchronization (e.g.,
Surely, it would be interesting to see if our bound can be tightened.
R1. Clarification on log T communication rounds: However, for local SGD with periodic averaging, the proof techniques are more involved.
We do not tune the learning rate.